Members: Can Balkose, Zep Van Boxtel, Stan Vos, Julia Michels
Student numbers: 6068383, 4903684, 4725603, 4996569
Requires data modeling and quantitative research in Transport, Infrastructure & Logistics
Research Question:
What was the effect of the COVID-19 pandemic on transportation usage and mode choice across different regions and demographic groups in the Netherlands?
Objectives
-To analyze and visualize the impact of the COVID-19 pandemic on transportation usage and mode choice across different regions of the Netherlands.
-To provide insight into how government policies and public sentiment influenced transportation trends during the pandemic.
-To identify the key demographic factors that influenced transportation mode choice during the pandemic.
-To understand how urban and rural areas were affected differently by the pandemic in terms of transportation usage and mode choice.
-To understand how transportation behavior changed across demographic groups after the pandemic, and to draw conclusions about potential long-term impacts on transportation behavior.
Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling
Author 1:
Author 2:
Author 3:
import pandas as pd
from scipy.signal import find_peaks
from scipy.signal import argrelextrema
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Distance covered per year, by degree of urbanization of the region and mode of transport
data_distance_mode_urban = 'data/Distance_covered_on_different_urban_areas.csv'
df_urbanization_mode_urban = pd.read_csv(data_distance_mode_urban)
df_urbanization_mode_urban
| | Year | Region | Mode of Transport | Total Distance (billion km) |
|---|---|---|---|---|
| 0 | 2018 | Extremely urbanized | Combined | 46.8 |
| 1 | 2019 | Extremely urbanized | Combined | 46.2 |
| 2 | 2020 | Extremely urbanized | Combined | 31.4 |
| 3 | 2021 | Extremely urbanized | Combined | 36.3 |
| 4 | 2022 | Extremely urbanized | Combined | 40.9 |
| ... | ... | ... | ... | ... |
| 195 | 2018 | Not urbanized | Other | 2.0 |
| 196 | 2019 | Not urbanized | Other | 2.1 |
| 197 | 2020 | Not urbanized | Other | 1.5 |
| 198 | 2021 | Not urbanized | Other | 1.4 |
| 199 | 2022 | Not urbanized | Other | 1.3 |
200 rows × 4 columns
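To make the pandemic effect in this table easier to read, each year can be expressed as a percentage change relative to 2019. The following is a minimal sketch, not part of the original analysis: it rebuilds only the five "Extremely urbanized" / "Combined" rows shown in the preview above, and the helper name `add_change_vs_baseline` and the output column name are hypothetical.

```python
import pandas as pd

# Rows copied from the table preview above (Extremely urbanized, Combined).
distance_sample = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Region": ["Extremely urbanized"] * 5,
    "Mode of Transport": ["Combined"] * 5,
    "Total Distance (billion km)": [46.8, 46.2, 31.4, 36.3, 40.9],
})


def add_change_vs_baseline(df, baseline_year=2019):
    """Hypothetical helper: add the % change versus the baseline year
    for every Region / Mode of Transport combination."""
    value_col = "Total Distance (billion km)"
    keys = ["Region", "Mode of Transport"]
    # One baseline value per Region / Mode of Transport combination.
    baseline = (
        df.loc[df["Year"] == baseline_year, keys + [value_col]]
        .rename(columns={value_col: "baseline"})
    )
    out = df.merge(baseline, on=keys, how="left")
    out["Change vs baseline (%)"] = (
        (out[value_col] / out["baseline"] - 1) * 100
    ).round(1)
    return out.drop(columns="baseline")


change_sample = add_change_vs_baseline(distance_sample)
print(change_sample[["Year", "Change vs baseline (%)"]])
```

On these five rows, the distance covered in 2020 comes out roughly 32% below the 2019 level. The same helper could be applied to the full `df_urbanization_mode_urban` frame, assuming it has exactly one row per year, region and mode.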
# Usage of public transportation (%) per demographic group and year
data_usage_of_public_transport = 'data/Usage_of_public_transportation.csv'
df_usage_of_public_transport = pd.read_csv(data_usage_of_public_transport)
df_usage_of_public_transport
| | Demographic | Year | Usage of public transportation (%) |
|---|---|---|---|
| 0 | Age: 12 to 17 years | 2018 | 11.7 |
| 1 | Age: 12 to 17 years | 2019 | 10.7 |
| 2 | Age: 12 to 17 years | 2020 | 6.3 |
| 3 | Age: 12 to 17 years | 2021 | 6.3 |
| 4 | Age: 12 to 17 years | 2022 | 9.2 |
| ... | ... | ... | ... |
| 100 | No driver's license; 17 years or older | 2018 | 17.5 |
| 101 | No driver's license; 17 years or older | 2019 | 16.3 |
| 102 | No driver's license; 17 years or older | 2020 | 8.5 |
| 103 | No driver's license; 17 years or older | 2021 | 9.8 |
| 104 | No driver's license; 17 years or older | 2022 | 13.7 |
105 rows × 3 columns
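A wide layout with one column per year makes it straightforward to compare groups on the size of the 2020 drop and on how far usage had recovered by 2022. The sketch below is illustrative only: it uses the two demographic groups visible in the preview above, and the derived column names are assumptions.

```python
import pandas as pd

# Rows copied from the table preview above.
usage_sample = pd.DataFrame({
    "Demographic": ["Age: 12 to 17 years"] * 5
    + ["No driver's license; 17 years or older"] * 5,
    "Year": [2018, 2019, 2020, 2021, 2022] * 2,
    "Usage of public transportation (%)": [
        11.7, 10.7, 6.3, 6.3, 9.2,
        17.5, 16.3, 8.5, 9.8, 13.7,
    ],
})

# One row per demographic group, one column per year.
usage_wide = usage_sample.pivot(
    index="Demographic",
    columns="Year",
    values="Usage of public transportation (%)",
)

# Drop between 2019 and 2020 in percentage points, and 2022 usage as a share of 2019.
usage_wide["Drop 2019-2020 (pp)"] = (usage_wide[2020] - usage_wide[2019]).round(1)
usage_wide["2022 as % of 2019"] = (usage_wide[2022] / usage_wide[2019] * 100).round(1)

print(usage_wide[["Drop 2019-2020 (pp)", "2022 as % of 2019"]])
```

For both groups in this sample, usage in 2022 was still below the 2019 level (about 86% and 84% of it, respectively).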
# Traffic on Dutch highways on weekdays and weekends, indexed to 2019 (2019 = 100)
data_traffic_highways = 'data/CBS Dutch highway traffic.csv'
df_data_traffic_highways = pd.read_csv(data_traffic_highways)
# Drop the last three rows of the file (assumed here to be non-data footer rows in the CSV export)
df_data_traffic_highways = df_data_traffic_highways.iloc[:-3]
df_data_traffic_highways
| | Week | Doordeweeks, 2020 (2019 = 100) | In het weekeinde, 2020 (2019 = 100) | Doordeweeks, 2021 (2019 = 100) | In het weekeinde, 2021 (2019 = 100) | Doordeweeks, 2022 (2019 = 100) | In het weekeinde, 2022 (2019 = 100) | Doordeweeks, 2023 (2019 = 100) | In het weekeinde, 2023 (2019 = 100) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 83.0 | 101.0 | 71.0 | 67.0 | 96.0 | 82.0 | 103.0 | 99.0 |
| 1 | 2 | 99.0 | 102.0 | 79.0 | 64.0 | 86.0 | 86.0 | 93.0 | 98.0 |
| 2 | 3 | 100.0 | 102.0 | 77.0 | 65.0 | 85.0 | 84.0 | 91.0 | 95.0 |
| 3 | 4 | 104.0 | 106.0 | 78.0 | 67.0 | 88.0 | 91.0 | 98.0 | 103.0 |
| 4 | 5 | 102.0 | 103.0 | 78.0 | 48.0 | 87.0 | 86.0 | 95.0 | 100.0 |
| 5 | 6 | 99.0 | 88.0 | 62.0 | 61.0 | 87.0 | 91.0 | 92.0 | 99.0 |
| 6 | 7 | 97.0 | 90.0 | 73.0 | 68.0 | 82.0 | 82.0 | 93.0 | 85.0 |
| 7 | 8 | 99.0 | 87.0 | 80.0 | 68.0 | 88.0 | 89.0 | 92.0 | 95.0 |
| 8 | 9 | 94.0 | 105.0 | 78.0 | 74.0 | 86.0 | 92.0 | 91.0 | 104.0 |
| 9 | 10 | 98.0 | 99.0 | 80.0 | 63.0 | 88.0 | 89.0 | 93.0 | 96.0 |
| 10 | 11 | 91.0 | 67.0 | 80.0 | 71.0 | 88.0 | 91.0 | 94.0 | 99.0 |
| 11 | 12 | 60.0 | 38.0 | 80.0 | 68.0 | 89.0 | 92.0 | 93.0 | 91.0 |
| 12 | 13 | 51.0 | 33.0 | 80.0 | 67.0 | 87.0 | 85.0 | 92.0 | 93.0 |
| 13 | 14 | 52.0 | 33.0 | 77.0 | 60.0 | 88.0 | 84.0 | 95.0 | 88.0 |
| 14 | 15 | 52.0 | 35.0 | 76.0 | 65.0 | 90.0 | 87.0 | 89.0 | 94.0 |
| 15 | 16 | 47.0 | 39.0 | 76.0 | 67.0 | 86.0 | 92.0 | 89.0 | 92.0 |
| 16 | 17 | 58.0 | 53.0 | 75.0 | 85.0 | 84.0 | 107.0 | 85.0 | 113.0 |
| 17 | 18 | 56.0 | 49.0 | 81.0 | 80.0 | 89.0 | 103.0 | 91.0 | 99.0 |
| 18 | 19 | 61.0 | 57.0 | 77.0 | 75.0 | 93.0 | 93.0 | 94.0 | 96.0 |
| 19 | 20 | 66.0 | 57.0 | 83.0 | 74.0 | 89.0 | 91.0 | 87.0 | 100.0 |
| 20 | 21 | 64.0 | 61.0 | 78.0 | 80.0 | 86.0 | 97.0 | 93.0 | 96.0 |
| 21 | 22 | 78.0 | 62.0 | 89.0 | 75.0 | 98.0 | 83.0 | 97.0 | 87.0 |
| 22 | 23 | 73.0 | 70.0 | 84.0 | 88.0 | 88.0 | 98.0 | 93.0 | 104.0 |
| 23 | 24 | 81.0 | 73.0 | 87.0 | 84.0 | 94.0 | 95.0 | 94.0 | 96.0 |
| 24 | 25 | 81.0 | 82.0 | 86.0 | 84.0 | 90.0 | 92.0 | 91.0 | 97.0 |
| 25 | 26 | 84.0 | 81.0 | 87.0 | 88.0 | 91.0 | 93.0 | 91.0 | 93.0 |
| 26 | 27 | 86.0 | 84.0 | 88.0 | 90.0 | 90.0 | 93.0 | 91.0 | 95.0 |
| 27 | 28 | 86.0 | 89.0 | 86.0 | 87.0 | 91.0 | 94.0 | 93.0 | 95.0 |
| 28 | 29 | 89.0 | 95.0 | 88.0 | 88.0 | 89.0 | 94.0 | 92.0 | 95.0 |
| 29 | 30 | 93.0 | 95.0 | 89.0 | 88.0 | 93.0 | 97.0 | NaN | NaN |
| 30 | 31 | 93.0 | 91.0 | 88.0 | 87.0 | 92.0 | 93.0 | NaN | NaN |
| 31 | 32 | 91.0 | 90.0 | 90.0 | 94.0 | 91.0 | 93.0 | NaN | NaN |
| 32 | 33 | 88.0 | 91.0 | 90.0 | 96.0 | 90.0 | 98.0 | NaN | NaN |
| 33 | 34 | 90.0 | 84.0 | 91.0 | 87.0 | 92.0 | 92.0 | NaN | NaN |
| 34 | 35 | 90.0 | 86.0 | 91.0 | 90.0 | 94.0 | 91.0 | NaN | NaN |
| 35 | 36 | 92.0 | 92.0 | 94.0 | 90.0 | 94.0 | 93.0 | NaN | NaN |
| 36 | 37 | 90.0 | 92.0 | 92.0 | 96.0 | 92.0 | 89.0 | NaN | NaN |
| 37 | 38 | 92.0 | 89.0 | 94.0 | 93.0 | 91.0 | 91.0 | NaN | NaN |
| 38 | 39 | 89.0 | 74.0 | 93.0 | 93.0 | 91.0 | 80.0 | NaN | NaN |
| 39 | 40 | 88.0 | 75.0 | 96.0 | 97.0 | 96.0 | 94.0 | NaN | NaN |
| 40 | 41 | 82.0 | 76.0 | 92.0 | 98.0 | 91.0 | 93.0 | NaN | NaN |
| 41 | 42 | 81.0 | 70.0 | 93.0 | 98.0 | 91.0 | 94.0 | NaN | NaN |
| 42 | 43 | 79.0 | 68.0 | 92.0 | 90.0 | 94.0 | 97.0 | NaN | NaN |
| 43 | 44 | 77.0 | 66.0 | 90.0 | 85.0 | 91.0 | 86.0 | NaN | NaN |
| 44 | 45 | 78.0 | 68.0 | 90.0 | 83.0 | 93.0 | 94.0 | NaN | NaN |
| 45 | 46 | 79.0 | 69.0 | 85.0 | 83.0 | 94.0 | 96.0 | NaN | NaN |
| 46 | 47 | 81.0 | 72.0 | 86.0 | 80.0 | 93.0 | 97.0 | NaN | NaN |
| 47 | 48 | 82.0 | 75.0 | 81.0 | 76.0 | 91.0 | 88.0 | NaN | NaN |
| 48 | 49 | 83.0 | 73.0 | 85.0 | 80.0 | 92.0 | 95.0 | NaN | NaN |
| 49 | 50 | 82.0 | 72.0 | 85.0 | 79.0 | 92.0 | 90.0 | NaN | NaN |
| 50 | 51 | 76.0 | 63.0 | 78.0 | 72.0 | 87.0 | 79.0 | NaN | NaN |
| 51 | 52 | 83.0 | 59.0 | 89.0 | 61.0 | 105.0 | 96.0 | NaN | NaN |
| 52 | 53 | 86.0 | 65.0 | NaN | NaN | NaN | NaN | NaN | NaN |
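The column headers in this table are in Dutch ("Doordeweeks" means on weekdays, "In het weekeinde" means at the weekend) and each combines a day type with a year. A possible way to make the table easier to filter and plot is to reshape it into long format with separate English-labelled columns. This is a sketch under assumptions: it uses three weeks copied from the table above, and the new column names (`Day type`, `Year`, `Index (2019 = 100)`) are hypothetical.

```python
import pandas as pd

# Three weeks copied from the table above (2020 and 2021 columns only).
traffic_sample = pd.DataFrame({
    "Week": [11, 12, 13],
    "Doordeweeks, 2020 (2019 = 100)": [91.0, 60.0, 51.0],
    "In het weekeinde, 2020 (2019 = 100)": [67.0, 38.0, 33.0],
    "Doordeweeks, 2021 (2019 = 100)": [80.0, 80.0, 80.0],
    "In het weekeinde, 2021 (2019 = 100)": [71.0, 68.0, 67.0],
})

# English labels for the Dutch day-type prefixes.
day_type_labels = {"Doordeweeks": "Weekdays", "In het weekeinde": "Weekend"}

# Wide -> long: one row per (week, original column).
traffic_long = traffic_sample.melt(
    id_vars="Week", var_name="Series", value_name="Index (2019 = 100)"
)

# Split "<day type>, <year> (2019 = 100)" into its two parts.
parts = traffic_long["Series"].str.extract(
    r"^(?P<day>.+), (?P<year>\d{4}) \(2019 = 100\)$"
)
traffic_long["Day type"] = parts["day"].map(day_type_labels)
traffic_long["Year"] = parts["year"].astype(int)

# Remove the combined header and any weeks without data (NaN in the source table).
traffic_long = traffic_long.drop(columns="Series").dropna(
    subset=["Index (2019 = 100)"]
)

print(traffic_long.head())
```

With the data in this shape, a single call such as `sns.lineplot(data=traffic_long, x="Week", y="Index (2019 = 100)", hue="Year", style="Day type")` could draw all series with an English legend.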
Mobility; per person, personal characteristics, travel purposes and regions
https://opendata.cbs.nl/statline/#/CBS/en/dataset/84687ENG/table?dl=97AA7
This data is yet to be added
# Filter out the rows where the mode of transport is 'Combined', so that totals are not double-counted
filtered_df_urbanization_mode_urban = df_urbanization_mode_urban[df_urbanization_mode_urban["Mode of Transport"] != 'Combined']
# Pie chart of the distance covered by mode of transport, one chart per year
years_to_visualize = [2018, 2019, 2020, 2021, 2022]
for year in years_to_visualize:
    df_year = filtered_df_urbanization_mode_urban[filtered_df_urbanization_mode_urban['Year'] == year]
    # Sum over all regions to get the total distance per mode of transport
    mode_distance = df_year.groupby('Mode of Transport')['Total Distance (billion km)'].sum()
    plt.figure(figsize=(3, 3))
    plt.pie(mode_distance, labels=mode_distance.index, autopct='%1.1f%%', startangle=140)
    plt.title(f'Distance Covered by Mode of Transport in {year}')
    plt.axis('equal')
    plt.show()
# Select the rows for the equivalised income groups
filtered_income_df_usage_of_public_transport = df_usage_of_public_transport[df_usage_of_public_transport['Demographic'].str.contains('Equ')]
# Animated bar chart: one frame per year, one bar per income group
fig = px.bar(
    filtered_income_df_usage_of_public_transport,
    x="Demographic",
    y="Usage of public transportation (%)",
    color='Demographic',
    animation_frame="Year",
    range_y=[0, 20],
    title="Usage of Public Transportation Over Years",
    labels={"Usage of public transportation (%)": "Usage (%)"},
)
fig.update_xaxes(categoryorder='total descending')
fig.show()
# Select the driver's license and car ownership groups ('river' matches both 'Driver' and 'driver')
filtered_driver_license_df_usage_of_public_transport = df_usage_of_public_transport[df_usage_of_public_transport['Demographic'].str.contains('river')]
fig = px.line(
    filtered_driver_license_df_usage_of_public_transport,
    x="Year",
    y="Usage of public transportation (%)",
    color="Demographic",
    title="Usage of Public Transportation Over Years by Driver License and Car Ownership",
    labels={"Usage of public transportation (%)": "Usage (%)"},
    markers=True
)
# Show one tick per year; tick positions are numeric to match the integer Year column
desired_years = [2018, 2019, 2020, 2021, 2022]
years = [str(year) for year in desired_years]
fig.update_xaxes(tickvals=desired_years, ticktext=years)
fig.show()
sns.set_style("whitegrid")
plt.figure(figsize=(12, 6))
for column in df_data_traffic_highways.columns[1:]:
    sns.lineplot(x="Week", y=column, data=df_data_traffic_highways, label=column)
plt.legend(loc="upper right")
plt.xlabel("Week")
plt.ylabel("Value (2019 = 100)")
plt.title("Traffic Data Over Weeks")
plt.show()
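The line plot shows a sharp dip in spring 2020. To read off the low point numerically instead of from the figure, the week with the lowest index can be looked up per series. The sketch below is not part of the original notebook: it uses weeks 10 to 20 of the two 2020 columns, copied from the highway traffic table, and the output column names are hypothetical.

```python
import pandas as pd

# Weeks 10-20 of the 2020 series, copied from the highway traffic table.
spring_2020 = pd.DataFrame({
    "Week": list(range(10, 21)),
    "Doordeweeks, 2020 (2019 = 100)": [
        98.0, 91.0, 60.0, 51.0, 52.0, 52.0, 47.0, 58.0, 56.0, 61.0, 66.0,
    ],
    "In het weekeinde, 2020 (2019 = 100)": [
        99.0, 67.0, 38.0, 33.0, 33.0, 35.0, 39.0, 53.0, 49.0, 57.0, 57.0,
    ],
}).set_index("Week")

# idxmin returns the index label (here: the week) of the first minimum per column.
troughs = pd.DataFrame({
    "Week of minimum": spring_2020.idxmin(),
    "Minimum index": spring_2020.min(),
})

print(troughs)
```

In this excerpt, weekday traffic bottoms out at 47 in week 16 and weekend traffic at 33 in week 13 (the same value recurs in week 14; `idxmin` reports the first occurrence).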